A Comparison of Manual and Automatic Constructions of Category Hierarchy for Classifying Large Corpora
Abstract
We address the problem of dealing with a large collection of data, and investigate the use of automatically constructing a category hierarchy from a given set of categories to improve the classification of large corpora. We use two well-known techniques, partitioning clustering with k-means and a loss function, to create the category hierarchy. k-means is used to cluster the given categories into a hierarchy. To select the proper number of clusters k, we use a loss function, which measures the degree of our disappointment in any differences between the true distribution over inputs and the learner's prediction. Once the optimal k is selected, the procedure is repeated for each cluster. Our evaluation using the 1996 Reuters corpus, which consists of 806,791 documents, shows that automatically constructing the hierarchy improves classification accuracy.
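The recursive procedure described above (cluster the categories with k-means, choose k by minimising a loss, then repeat inside each cluster) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the loss function, so `clustering_loss` is a hypothetical stand-in (within-cluster squared error plus a fixed penalty per cluster). Each category is assumed to already be represented by a vector (for example, the centroid of its training documents). The farthest-first initialisation, `max_k`, `penalty`, and the toy vectors are all illustrative choices, not taken from the paper.

```python
from typing import List, Sequence, Union

Vector = Sequence[float]
Tree = Union[str, list]  # a leaf category name, or a list of subtrees


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(points: List[Vector], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-first seeding (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def clustering_loss(points: List[Vector], labels: List[int], k: int, penalty: float) -> float:
    """HYPOTHETICAL loss, not the paper's: within-cluster squared error
    plus a fixed penalty per cluster, so larger k is not always preferred."""
    sse = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            centre = mean(members)
            sse += sum(sq_dist(p, centre) for p in members)
    return sse + penalty * k


def build_hierarchy(names: List[str], vectors: List[Vector],
                    max_k: int = 3, penalty: float = 1.0) -> Tree:
    """Choose k by minimising the loss, cluster, then recurse into each cluster."""
    if len(names) == 1:
        return names[0]
    candidates = range(1, min(max_k, len(names)) + 1)
    runs = {k: kmeans(vectors, k) for k in candidates}
    best_k = min(candidates, key=lambda k: clustering_loss(vectors, runs[k], k, penalty))
    groups = [[i for i, lab in enumerate(runs[best_k]) if lab == c] for c in range(best_k)]
    groups = [g for g in groups if g]  # drop empty clusters
    if len(groups) < 2:
        return list(names)  # no split chosen: this node is a flat group of leaves
    return [build_hierarchy([names[i] for i in g], [vectors[i] for i in g], max_k, penalty)
            for g in groups]


# Toy category vectors, invented for illustration only.
cats = ["grain", "wheat", "corn", "crude", "gas", "oil"]
vecs = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1], [5.1, 5.0], [5.05, 5.05]]
print(build_hierarchy(cats, vecs))
# [['grain', 'wheat', 'corn'], ['crude', 'gas', 'oil']]
```

Including k = 1 among the candidates gives the recursion a natural stopping point: when no split lowers the loss, the node is kept as a flat group of categories. With a per-cluster penalty, raising `penalty` yields shallower, coarser hierarchies.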
Publication date: 2004